Abstract

For natural language understanding systems designed for domains
including relatively complex equipment, it is not sufficient to use
general knowledge about this equipment. We show problems which can be
solved only if the system has access to a detailed equipment model. We
discuss the structure of such models in some detail and, in particular,
the mixed static/dynamic nature of the model. As an illustration, we
describe parts of a simulation model for an air compressor. Finally, we
demonstrate how to find referents in this model for noun phrases.

I. Introduction

The work presented here is part of PROTEUS (PROtotype TExt
Understanding System), currently under development at the Courant
Institute of Mathematical Sciences, New York University.** The
objective of our research is to understand short natural language
texts about equipment. Our texts at present are CASualty REPorts
(CASREPs) which describe failures of equipment installed on Navy
ships. Our initial domain is the starting air system for propulsion
gas turbines. A typical CASREP consists of several sentences, for
example:

Unable to maintain lube oil pressure to SAC [Starting Air
Compressor]. Disengaged immediately after alarm. Metal particles
in oil sample and strainer.

It is widely accepted among researchers that in order to create
natural language understanding systems robust enough for practical
application, it is necessary to provide them with a lot of common-sense
and domain-specific knowledge. However, so far, there is no consensus
as to what is the best way of choosing, organizing and using such
knowledge.

The novelty of the approach presented here is that, besides general
knowledge about equipment, we also use a quite extensive simulation
model for the specific piece of equipment which the texts deal with.
We see the following merits of having a simulation model:

*This research was supported in part by the Defense Advanced
Research Projects Agency under contract N00014-85-K-0163 from the
Office of Naval Research and the National Science Foundation under
grant DCR-85-01443.

**This work is being done in collaboration with Unisys Defense
Systems (formerly the System Development Corp.) as part of the
DARPA Strategic Computing Program.

• The model provides us with a reliable background against which
we can check the correctness of the understanding process on several
levels: finding referents of noun phrases, assigning semantic cases
to verbs, establishing causal relationships between individual
sentences of the text.

• The requirements of simulation help us to decide what kind of
knowledge about the equipment should be included in the model, how
it could best be organized and which inferences it should be possible
to make. It appears that the information needed for simulation
largely coincides with that necessary for language understanding.

• The ability to simulate the behavior of a piece of equipment
provides a convenient method of verifying the understanding process
at the level of interaction with a user: a dynamic graphical
interface provides the user with insight into the way the input has
been understood by the system.

In the remainder of this paper we shall address three issues: Why is
a detailed equipment model needed? How should such a model be
structured and, in particular, what balance should we strike between
a static model and one created dynamically as the text requires? How
should a noun phrase analyzer be organized to utilize such a model,
and how is it affected by this static/dynamic balance?

II.  Need  for  a  Model 

In many natural language processing systems, the domain knowledge
consists of general information about the objects and operations in
the domain. In the equipment domain, this would include knowledge of
the possible states and actions of equipment components, such as
valves, pumps, and gears. It is clear, however, that such knowledge
is not sufficient for a complete understanding of the CASREP messages
we are studying.

One feature of technical texts is the heavy use of nominal compounds.
It seems that their average length is proportional to the complexity
of the discourse domain. In the domain of the starting air system,
examples like

stripped lube oil pump drive gear

are by no means rare.

The problem with nominal compounds is their ambiguity. The syntactic
analysis is of almost no help here. Even using semantic (selectional)
constraints, as in [Finin, 1986], substantial ambiguity often remains.
When we know that the nominal compounds refer to objects existing in
the system, and have access to a model of the system, we can impose
much tighter constraints, thus reducing the ambiguity.

The need for an equipment model is even more evident when we consider
the analysis of a multi-sentence text such as

Starting air regulating valve failed.
Unable to consistently start nr 1b turbine.

(This is an excerpt from an actual CASREP.) In the starting air
system (our initial domain) there are three different valves
regulating starting air. Two questions might be posed in connection
with this short text: (1) which of the three valves was meant in the
first sentence? (2) could the failure of the valve mentioned in the
first sentence be the cause of the trouble reported in the second
sentence?

The general knowledge of equipment may tell us a lot about failures,
such as: if a machinery element fails, then it is inoperative, or if
an element is inoperative, then the element of which it is part is
probably inoperative as well, etc. Unfortunately, such knowledge is
not enough: there is no way to answer these two questions (not only
for an artificial understanding system, but even for us, humans)
without access to rather detailed knowledge about how various
elements of the given piece of equipment are interconnected and how
they work as an ensemble. In our case we could hypothesize (using
general knowledge about text structures) that there is a causal
relationship between the facts stated in the two sentences. To test
this, we would have to consider each of the three valves in turn and
check how its inoperative state could affect the starting of the
specific (i.e. nr 1b) turbine. To perform these tests we would need a
simulation model. If one of the three valves, when inoperative, would
make the turbine starting unreliable, then we could claim that this
valve is the proper referent for the starting air regulating valve
mentioned in the first sentence. This finding would let us also
answer question (2) affirmatively.
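
The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names and the connections in FEEDS are hypothetical (they are not taken from the actual starting air system), and plain downstream reachability stands in for a real simulation.

```python
# Hypothetical topology (not taken from the real starting air system):
# each component maps to the components that depend on its output.
FEEDS = {
    "regulating_valve_1": ["starter_1a"],
    "regulating_valve_2": ["starter_1b"],
    "regulating_valve_3": ["starter_2a"],
    "starter_1a": ["turbine_1a"],
    "starter_1b": ["turbine_1b"],
    "starter_2a": ["turbine_2a"],
}


def affected_by_failure(component, feeds):
    """Return every component reachable downstream of a failed component."""
    seen, stack = set(), [component]
    while stack:
        for successor in feeds.get(stack.pop(), []):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def consistent_referents(candidates, symptom, feeds):
    """Keep the candidates whose failure could explain the reported symptom."""
    return [c for c in candidates if symptom in affected_by_failure(c, feeds)]


# The three candidate referents of "starting air regulating valve".
valves = ["regulating_valve_1", "regulating_valve_2", "regulating_valve_3"]
referents = consistent_referents(valves, "turbine_1b", FEEDS)
print(referents)
```

In this toy topology only one valve lies upstream of the nr 1b turbine, so the candidate set shrinks to a single referent and the causal hypothesis is confirmed at the same time.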

The above two considerations demonstrate that in cases where the
domain is very specialized and complicated (a typical situation for
real-life equipment), language understanding systems should be
provided not only with general knowledge about the equipment but also
have access to its model.

III.  PROTEUS  structure 

PROTEUS consists of a syntactic analyzer, a semantic analyzer and a
discourse analyzer. The semantic analyzer translates the regularized
syntactic analysis into a predicate-argument structure. As part of
the semantic analysis, the noun phrase analyzer (described below)
identifies elements of the equipment model corresponding to the
referents of noun phrases. The discourse analyzer [Joskowicz et al.,
1987], using the equipment model, identifies implicit causal and
temporal relations in the message. [Grishman et al., 1986] describes
the overall organization of PROTEUS. The system is implemented on
Symbolics LISP machines.

IV.  Simulation  Model 

A.  Structure  of  the  simulation  models 

The target domains for PROTEUS are equipment units (EUs): complex
technical systems which accomplish physical tasks on demand. These
tasks are carried out as serial and parallel combinations of simpler
tasks, which are performed by constituent EUs of the main equipment
unit. Often these simpler tasks can be decomposed further, leading to
a hierarchy of tasks and EUs.

The EUs transmit their effects through various media, such as gases,
liquids, mechanical movement, and electric current. These media
travel from one EU to another through conduits appropriate to the
different types of media.

PROTEUS models have the structure of a set of transition networks.
They consist of nodes connected by directed links. The nodes
correspond to the constituent EUs of the system; the links to the
conduits connecting the EUs. The hierarchical structure of the EUs is
reflected in the hierarchical structure of the networks. To represent
the internal structure of an EU, we have the corresponding node point
to another network in the model.

Associated with each link is a working substance (WS). These WSs
correspond to the media entering and leaving an EU (for example, the
rotary motion provided to a pump and the fluid entering and leaving
the pump). We can think of the WSs associated with links entering and
leaving a node as the input and output data of the node.

Associated with the nodes, links, and working substances are
properties, recording the structure, function and (time-dependent)
state of the system elements.
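
As an illustration of this structure, the following Python sketch represents nodes, directed links, working substances and their properties. It is not the PROTEUS representation (which was built in Flavors); the class names, the field names and the small lube oil network are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkingSubstance:
    kind: str                                   # e.g. "oil", "rotary motion"
    properties: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str                                   # the equipment unit (EU)
    properties: dict = field(default_factory=dict)
    subnetwork: Optional["Network"] = None      # internal structure, if modeled


@dataclass
class Link:
    source: Node                                # directed conduit: source -> target
    target: Node
    substance: WorkingSubstance
    properties: dict = field(default_factory=dict)


@dataclass
class Network:
    name: str
    nodes: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def inputs(self, node):
        """Working substances on links entering the node."""
        return [link.substance for link in self.links if link.target is node]

    def outputs(self, node):
        """Working substances on links leaving the node."""
        return [link.substance for link in self.links if link.source is node]


# Hypothetical fragment of a lube oil network.
diesel, sump, strainer = Node("diesel"), Node("sump"), Node("strainer")
pump = Node("lube oil pump", {"state": "running"})
net = Network(
    "lube oil system",
    nodes=[diesel, sump, pump, strainer],
    links=[
        Link(diesel, pump, WorkingSubstance("rotary motion")),
        Link(sump, pump, WorkingSubstance("oil")),
        Link(pump, strainer, WorkingSubstance("oil")),
    ],
)
pump_inputs = [ws.kind for ws in net.inputs(pump)]
pump_outputs = [ws.kind for ws in net.outputs(pump)]
```

The optional subnetwork field is how a node would point to another network describing its internal structure, which gives the hierarchy of networks described above.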

One design criterion for our models is that they make possible the
qualitative simulation of the modeled EUs (in most cases a precise
quantitative simulation is not required for language understanding).
The model includes time-dependent values for the states of modeled
components and functions which determine the state and outputs of the
nodes. Simulation is performed by an event-driven algorithm which is
triggered by an external event (such as operator action or reported
failure) and continues until a stable state is reached.
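
A minimal sketch of such an event-driven loop, in Python, may make the idea concrete. It is not the PROTEUS algorithm itself: the function name simulate, the qualitative state values and the three-node chain are all assumptions chosen for illustration.

```python
from collections import deque


def simulate(downstream, update, state, event):
    """Propagate an external event until no node changes state.

    downstream: node -> list of nodes fed by it
    update:     node -> function computing the node's state from the full state
    event:      (node, new_state) pair, e.g. a reported failure
    """
    node, new_value = event
    state = dict(state)                  # leave the caller's state untouched
    state[node] = new_value
    queue = deque(downstream.get(node, []))
    while queue:
        current = queue.popleft()
        value = update[current](state)
        if value != state[current]:      # only a change generates new events
            state[current] = value
            queue.extend(downstream.get(current, []))
    return state


# Hypothetical qualitative model: pump -> oil pressure -> alarm.
downstream = {"pump": ["oil_pressure"], "oil_pressure": ["alarm"]}
update = {
    "oil_pressure": lambda s: "normal" if s["pump"] == "running" else "low",
    "alarm": lambda s: "on" if s["oil_pressure"] == "low" else "off",
}
initial = {"pump": "running", "oil_pressure": "normal", "alarm": "off"}
final = simulate(downstream, update, initial, ("pump", "failed"))
print(final)
```

Because a node is re-evaluated only when one of its inputs changes, the loop stops as soon as the state is stable. On a network with feedback loops this sketch would need an extra guard against oscillation.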

We have implemented the model by defining prototypes for the various
types of components (valves, gearboxes, etc.) and then assembling a
system as a collection of instances of these prototypes (implemented
in Flavors). Information about each type of component is stored in
the prototype, so that only information specific to a particular
component need be stored in the instance. For example, in the case of
a gearbox, the information about its function (speed change) should
be stored in the prototype, and only the ratio of this change should
reside in the instance of a specific gearbox. The library of
prototypes should greatly simplify the creation of new equipment
models within the system.
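
The prototype/instance split maps naturally onto classes and objects. The sketch below uses Python classes in place of Flavors; the class names, the output_speed method and the numeric ratio are hypothetical.

```python
class Component:
    """Base prototype: information shared by every component type."""
    function = "unspecified"

    def __init__(self, name):
        self.name = name


class Gearbox(Component):
    """Prototype for gearboxes: the function is stored once, at class level."""
    function = "speed change"

    def __init__(self, name, ratio):
        super().__init__(name)
        self.ratio = ratio               # the only gearbox-specific datum

    def output_speed(self, input_speed):
        return input_speed * self.ratio


# Hypothetical instance; the ratio is an invented value.
reduction = Gearbox("reduction gearbox", ratio=0.5)
speed = reduction.output_speed(3000)
```

Here vars(reduction) holds only the name and the ratio, while the function is looked up in the prototype, so adding a new gearbox to a model requires stating nothing but its ratio.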

B.  Level  of  detail 

How detailed a model should we construct? A first response might be
to include everything which potentially may be referred to in the
reports. This, however, seems impractical. Consider a typical
sentence from one of the reports:

Investigation revealed a broken tooth on the hub ring gear.

Considering that there are several different gears in our starting
air system and each of them has many teeth which are very much alike,
it is obvious that creating a separate description for each of them
would not be reasonable. The same remark is true for ball bearings or
for connecting elements like screws, bolts or pins. On the other
hand, information about the tooth conveyed in the above sentence
cannot go unnoticed. The solution we accepted for such elements is
not to include their descriptions in the model on a permanent basis
but to keep open the possibility to create and add them to the model
if such a need arises during the analysis. A rule of thumb for
deciding whether a particular element deserves a permanent place in
the model can be formulated as a question: How much information
specific to this element is necessary to solve understanding
problems, like finding referents or making inferences?

If there is substantial specific information, the element is included
as part of the permanent model. On the other hand, if all the needed
information can be derived from the larger unit of which this element
is a part (the gear of "tooth of gear"), and this larger unit is
always mentioned in descriptions of the part, then the element can be
added to the model dynamically when it is mentioned in a message.

C. An example: the starting air system

As our initial domain we have chosen the "starting air system" used
for starting gas turbines on Navy ships. The model consists of 15
networks with a total of about 175 nodes. One of these networks is
shown in Fig. 1. This figure is a Symbolics screen image generated by
PROTEUS from the model networks. Some parts of the display are
dynamic: gears rotate, oil moves as visible particles, etc. This
provides a direct visual presentation of the system's understanding
of a message (oil may stop flowing or a gear rotating). The dynamic
displays are achieved as a side effect of the simulation used for
understanding purposes.

Figure 1: The Lube Oil System

V. Noun Phrase Analysis

A. The role of noun phrase analysis

The goal of the Noun Phrase Analyzer (NPA) is to convert a noun
phrase into a set of referents that may be used by subsequent stages
of the system. To accomplish this it uses the equipment model in two
ways. First, the model is used to confirm possible relations between
noun phrase constituents (for example, that there is a gear which is
an adjacent part to a pump). Second, the model provides a set of
referents for the phrase.

The NPA converts from the linguistic representation to one in terms
of domain predicates. The interface between the NPA and the model is
called the Model Query Processor (MQP). The MQP evaluates the domain
predicates relative to the equipment model, and creates internal
representations for EUs when needed. Major equipment units are part
of the static model; others must be created dynamically.


B.  The  Analysis  Procedure 

The NPA fetches a semantic class and various other features for each
word in the noun phrase. Constituents are combined bottom-up based on
a set of rules stated in terms of these classes. These rules identify
relationships of the form (Pred Arg-1 Arg-2 ...) where Pred is a
predicate of the domain and the Arg-i are constituents of the noun
phrase. We have analyzed a corpus of 35 sentences and identified a
set of 13 predicates for analyzing noun phrases:

adjacent-to alarm couple drive lube made-of measure name operate-on
part-of regulate start location


Most current systems validate the application of these rules through
selectional checks (constraints on the argument classes of each
predicate) [Finin, 1986]. We perform such checks, then go a step
further and check for the existence of the specific relation between
specific entities. This is done through the MQP. The MQP is first
invoked for each word to obtain its internal representation in the
model, and then for each proposed predicate to verify the existence
of the corresponding relation in the model. If the relation exists,
the MQP returns the representation for the head constituent of the
relation.
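
The two-stage validation can be sketched as follows. Everything in this Python fragment is illustrative: the semantic classes, the selectional table and the toy relation set are assumptions, and treating the first argument as the head constituent is a simplification made for the example.

```python
# Hypothetical semantic classes for a few words of the domain.
SEMANTIC_CLASS = {
    "clutch": "part", "gear": "part", "pump": "part",
    "SAC": "unit", "oil": "substance",
}

# Selectional constraints: allowed classes for each argument (illustrative).
SELECTIONAL = {
    "part-of": ({"part"}, {"part", "unit"}),
    "adjacent-to": ({"part"}, {"part"}),
}

# Relations that hold in a toy equipment model.
MODEL_RELATIONS = {
    ("part-of", "clutch", "SAC"),
    ("adjacent-to", "gear", "pump"),
}


def selectional_check(pred, arg1, arg2):
    """Stage 1: do the argument classes fit the predicate at all?"""
    allowed1, allowed2 = SELECTIONAL[pred]
    return SEMANTIC_CLASS[arg1] in allowed1 and SEMANTIC_CLASS[arg2] in allowed2


def verify(pred, arg1, arg2):
    """Stage 2: does this specific relation exist in the model?

    Returns the head (here: first) argument on success, otherwise None.
    """
    if not selectional_check(pred, arg1, arg2):
        return None
    if (pred, arg1, arg2) not in MODEL_RELATIONS:
        return None
    return arg1


accepted = verify("part-of", "clutch", "SAC")
rejected_by_model = verify("part-of", "gear", "SAC")
rejected_by_class = verify("part-of", "oil", "pump")
```

The second call shows the extra filtering the model provides: a gear could be part of the SAC as far as the class constraints go, but the relation is rejected because this toy model does not contain it.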

For post-nominal modifiers the predicate is strongly indicated by the
preposition, and arguments (the head noun and the object of the
preposition) are explicit and delimited. Pre-nominal modifiers are
more difficult. The problem is to decide what predicates should be
used and what the arguments of these predicates are. Both the
predicates and their arguments may be given explicitly or implicitly.
Examples are: pressure regulating valve (the predicate and both its
arguments are explicit), drive gear (the predicate and one of its
arguments are explicit, the other argument, the object of DRIVE, is
implicit), pump shaft (the predicate, PART-OF, is implicit, both of
its arguments are explicit). The NPA considers the semantic features
of the items, together with order constraints, to match the items
with arguments of some canonical predicate. A match is considered
successful if it is possible to identify some (not necessarily all)
of the arguments of the predicate among the modifiers. For
verification purposes, it is assumed that the empty arguments match
anything. Once a matching canonical predicate has been found and as
many of its arguments matched with the modifiers as possible, the NPA
poses a verification query to the model.
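
The convention that empty arguments match anything amounts to a wildcard query. A minimal Python sketch, with a hypothetical fact list and with None standing for an empty argument:

```python
# Toy relations of the form (predicate, arg-1, arg-2); hypothetical contents.
FACTS = [
    ("drive", "gear", "pump"),
    ("part-of", "shaft", "pump"),
    ("regulate", "valve", "pressure"),
]


def matches(query, fact):
    """An empty (None) slot in the query matches any value in the fact."""
    return len(query) == len(fact) and all(
        q is None or q == f for q, f in zip(query, fact)
    )


def verification_query(query, facts):
    """Return every fact in the model compatible with the partial query."""
    return [fact for fact in facts if matches(query, fact)]


# "drive gear": the predicate and one argument are explicit,
# the object of DRIVE is left empty.
drive_gear = verification_query(("drive", "gear", None), FACTS)

# "pump shaft": the predicate PART-OF is supplied by the analyzer,
# both arguments are explicit.
pump_shaft = verification_query(("part-of", "shaft", "pump"), FACTS)

# No valve drives anything in this toy model, so the query fails.
drive_valve = verification_query(("drive", "valve", None), FACTS)
```

A non-empty result confirms the proposed predicate, and the matched facts also supply values for the arguments that were implicit in the phrase.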

In many cases the noun phrase does not determine a unique entity, so
a set is returned. Context and default information are used to
resolve such references. This is handled by the Reference Resolution
Module, following [Palmer et al., 1986].

C.  Model  Query  Processor 

We have observed the mixed static/dynamic characteristic of our
representation. Since the other modules are designed to be
independent of the model representation, this distinction must be
hidden by the MQP. We discuss here only queries posed by the NPA,
i.e. requests for the representation of a word, and predicate
verification queries.

The main EUs are recorded permanently in the equipment model. For
words corresponding to these EUs, the MQP contains pointers to all
the nodes in the model to which the word may refer (for example, the
entry for pump will point to all pumps in the model). However, two
classes of EUs are created dynamically by the MQP. The first class
consists of components too small to justify including in the model
(e.g. connecting pin in pump drive assembly or tooth of hub gear).
The second class involves aggregates of elements which are described
in the text and treated as a unit but do not correspond to a single
unit in the model hierarchy. There are several examples in our
corpus, such as the coupling from diesel to SAC lube oil pump.

The predicate verification queries must also distinguish between
static and dynamically created EUs. Where all the arguments are
statically modeled, the MQP need only check the model attribute
corresponding to the predicate. When an argument is dynamically
created, the MQP will generally have to modify the argument to
reflect the constraint expressed by the predicate. For example, in
verifying (part-of clutch SAC), where both clutch and SAC are
statically modeled, we check the PART-OF role of clutch. Whereas, in
verifying (part-of tooth gear), with tooth dynamically created, we
fill the PART-OF role of tooth.
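
The check-versus-fill behavior can be illustrated with a small Python sketch. The class and method names (Entity, ModelQueryProcessor, lookup, verify) are hypothetical, and creating a dynamic entity for every word missing from the static index is a simplification; the actual MQP creates only small components and aggregates.

```python
class Entity:
    """Internal representation of an equipment unit."""

    def __init__(self, name, dynamic=False):
        self.name = name
        self.dynamic = dynamic       # True if created during text analysis
        self.roles = {}              # predicate name -> related entity


class ModelQueryProcessor:
    """Hides the static/dynamic distinction from the noun phrase analyzer."""

    def __init__(self, static_entities):
        self.index = {}              # word -> all static nodes it may refer to
        for entity in static_entities:
            self.index.setdefault(entity.name, []).append(entity)

    def lookup(self, word):
        """Return the static nodes for a word, or create a dynamic entity."""
        found = self.index.get(word)
        if found:
            return list(found)
        return [Entity(word, dynamic=True)]

    def verify(self, predicate, arg, value):
        """Check the role of a static entity; fill it for a dynamic one."""
        if arg.dynamic:
            arg.roles[predicate] = value
            return True
        return arg.roles.get(predicate) is value


# Hypothetical static model: the clutch is recorded as part of the SAC.
sac, clutch, gear = Entity("SAC"), Entity("clutch"), Entity("gear")
clutch.roles["part-of"] = sac
mqp = ModelQueryProcessor([sac, clutch, gear])

clutch_ref = mqp.lookup("clutch")[0]
tooth = mqp.lookup("tooth")[0]                       # not in the static model

ok_static = mqp.verify("part-of", clutch_ref, sac)   # role is checked
ok_wrong = mqp.verify("part-of", gear, sac)          # role absent: rejected
ok_dynamic = mqp.verify("part-of", tooth, gear)      # role is filled
```

After the last call the dynamically created tooth carries a PART-OF role pointing to the gear, so later stages can treat it like any statically modeled element.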

VI.  Future  Work 

In addition to noun phrase analysis, the equipment model is used
heavily in discourse analysis: identifying implicit causal and
temporal relations. We have developed a preliminary implementation of
discourse analysis [Joskowicz et al., 1987] but much work remains to
be done, particularly regarding time dependencies. We also intend to
develop tools for the more efficient acquisition of equipment models,
and to study techniques for dealing with gaps in the domain model.

The initial motivation for PROTEUS was text understanding for
subsequent querying, summarization, and trend analysis. Our use of a
detailed equipment model similar to that employed in simulation
systems (e.g. STEAMER [Hollan et al., 1984]) and diagnostic systems
suggests that PROTEUS is also useful as an interface to such systems.